- Text formats and encoding
- “Text as data” - why use text?
- NLP workflow
- Regular expressions
- “Ask me anything”
November 19, 2019
Dictionary object with 1 key entry.
- [posemo]:
  - like, like*, :), (:, accept, accepta*, accepted, accepting, accepts, active, … interests, invigor*, joke*, joking, jolly, joy*, keen*, kidding, kind, kindly, kindn*, kiss*, laidback, laugh*, legit, libert*, likeab*, liked, likes, liking, livel*, lmao*, lmfao*, lol, love, loved, lovelier, ...
library("stringr")
x <- c("apple", "banana", "pear")
str_extract(x, "an")
## [1] NA "an" NA
bananas <- c("banana", "Banana", "BANANA")
str_detect(bananas, "banana")
## [1] TRUE FALSE FALSE
str_detect(bananas, regex("banana", ignore_case = TRUE))
## [1] TRUE TRUE TRUE
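As a small sketch of the same idea, case-insensitive matching can also be requested inside the pattern itself or through base R. The `(?i)` inline flag and the `grepl()` call are not part of the example above; they are shown only as alternatives that should give the same result on this input.

```r
library("stringr")

bananas <- c("banana", "Banana", "BANANA")

# Inline flag: (?i) switches on case-insensitive matching inside the pattern
inline_flag <- str_detect(bananas, "(?i)banana")

# Base R alternative: grepl() returns a logical vector, like str_detect()
base_r <- grepl("banana", bananas, ignore.case = TRUE)

print(inline_flag)
print(base_r)
```

Wrapping the pattern in `regex(..., ignore_case = TRUE)` keeps the pattern itself readable, which may matter more once patterns get longer.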
Regular expression syntax:
- . matches any character
- * matches zero or more of the preceding character
- + matches one or more of the preceding character
- () allows grouping
- [] defines character classes
- \p{} defines Unicode categories
- base R's grep() is older than the stringr functions but still possible to use
Glob patterns (see ?quanteda::valuetype):
- * means zero or more of any character
- ? means any single character
In the [posemo] dictionary shown earlier, entries such as like*, accepta* and joy* are glob patterns.
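The pattern rules above can be tried out in a short sketch. The example words are made up for illustration, and `utils::glob2rx()` is used here only as a convenient base R way to turn a glob into a regular expression; it is an assumption of this sketch, not a statement about how quanteda matches dictionary entries internally.

```r
library("stringr")

x <- c("color", "colour", "colouur", "clr")

# . matches any single character: "col", one character, then "r"
dot <- str_detect(x, "col.r")     # TRUE FALSE FALSE FALSE

# * matches zero or more of the preceding character ("u" here)
star <- str_detect(x, "colou*r")  # TRUE TRUE TRUE FALSE

# + matches one or more of the preceding character
plus <- str_detect(x, "colou+r")  # FALSE TRUE TRUE FALSE

# () groups characters so a quantifier applies to the whole group
grouped <- str_extract("banana", "(an)+")  # "anan"

# [] defines a character class: the first vowel in "pear"
vowel <- str_extract("pear", "[aeiou]")    # "e"

# \p{} matches a Unicode category: P is punctuation
punct <- str_extract_all("Hello, world!", "\\p{P}")[[1]]  # "," "!"

# Glob patterns: * is zero or more of any character, ? is exactly one
words <- c("like", "likes", "liking", "dislike")
glob_star <- grepl(glob2rx("like*"), words)               # TRUE TRUE FALSE FALSE
glob_q <- grepl(glob2rx("lo?"), c("lol", "lo", "love"))   # TRUE FALSE FALSE
```

Note that the glob `like*` does not match "liking", which is consistent with the dictionary listing "liking" as a separate entry.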
Take MY429: Quantitative Text Analysis (Lent Term)
Textual data hackathon!